In 2012, we placed 96 dataloggers recording temperature and humidity across Table Mt. and Silvermine. The loggers were stratified across gradients of elevation, slope, exposure and hillslope position (hilltop, valleybottom), and distance from both False Bay and the Atlantic. In previous analyses, the data has been analyzed to extract diurnal Tmin, Tmax, VPmax (maximum daily vapor pressure), and RHsat.hrs (hours per day with relative humidity above 95%).

All of these analyses and the summary data are available at: https://github.com/dackerly/table-mt-microclimate-2012.git

General summary statistics about the sites and daily weather values can be found in ‘manuscript-summary-stats.r’.

The goal of these analyses is to examine spatial variation in daily weather values, running models for each day through 2012. The initial question is whether the important spatial variables vary with season, but as shown below a key result is enormous day to day variation in the ability to explain spatial variation in weather variables. I’m still searching for explanations in regional weather or other factors to understand why the models differ so much from day to day.

#----------------#
# LOAD LIBRARIES #
#----------------#

rm(list=ls())
library(raster)
## Loading required package: sp
library(RColorBrewer)

#----------------#
# LOAD DATA      #
#----------------#

‘meta’ contains metadata on the 96 sites. ‘topo10’ and ‘topo30’ have a series of topographic features sampled from 10m and 30m dem rasters, respectively, based on lat/long positions of each site. SiteID is the unique name for each site.

Some of the variables in ‘meta’: * elevation, slope, and aspect: measured in the field - these are generally not used, and instead we use a consistent set of values taken from the 10m dem. * use4Analyses, use4ClimateSummaries: sites excluded due to incomplete data (see ms for criteria)

Variables in topo10 and topo30: * d2at, d2fb, d2cs: distance to atlantic, distance to false bay, distance to coast = min(d2at,d2fb) * rad080, rad172, rad355: solar radiation at summer and winter solstice, and equinox. Not generally used as we have diurnal radiation at each site * thl0, thl315, thl337: Topographic heat load as defined by McCune and Keon 2002 JVS, with the axis for calculation of the ‘folded aspect’ (and thus the maximum heat load) at 0, 337 or 315 degrees. The logic is that in the S Hem, NW facing slopes would be hotter than N facing, due to higher afternoon temperatures. * plow050, 125, 250, 500: Percent lower pixels - the percent of pixels surrounding a focal pixel with a lower elevation, in radii of 50, 125, 250 or 500 m. A measure of potential for cold air to flow away from a spot. Higher values indicate hilltops * tpi050, 125, 250, 500: Elevation of a focal pixel - mean elevation in the specified radius. Another measure of hillslope position, with positive values for hilltops, negative for valley bottoms.

meta <- read.csv('data/csv_masters/location_meta.csv',as.is=T)
dim(meta)
## [1] 96 27
head(meta)
##       domain siteID   logger UTM.east UTM.north   LON_DD    LAT_DD
## 1 SILVERMINE   CHK1 10008095   256907   6228779 18.36633 -34.05353
## 2 SILVERMINE   CHK2 10006349   257070   6229153 18.36820 -34.05020
## 3 SILVERMINE   CHK3 10008094   257271   6228903 18.37031 -34.05250
## 4 SILVERMINE   CHK4 10008096   257420   6229224 18.37201 -34.04964
## 5 SILVERMINE   CHK5 10008093   258353   6229582 18.38221 -34.04663
## 6 SILVERMINE   CHK6 10006348   258397   6229772 18.38274 -34.04493
##   elevation slope_deg aspect_deg magn.true.aspect use4Analyses
## 1       120        19        265                M            1
## 2       326         8        350                M            1
## 3       325        32        240                M            1
## 4       367        26        355                M            1
## 5       436        35         15                M            1
## 6       455        33        280                M            1
##   use4ClimateSummaries route CollSeq series deployed.as deployXssn
## 1                    1    CB       1     83    10008095          0
## 2                    1    CB       2     84    10006349          0
## 3                    1    CB       3     85    10008094          0
## 4                    1    CB       4     86    10008096          0
## 5                    1    CB       5     87    10008093          0
## 6                    1    CB       6     88    10006348          0
##   start.date start.hour end.date end.hour log.max.d VegSurveyPlot
## 1   11/10/11         18 12/31/13       23       150   111110-A-09
## 2   11/10/11         18 12/31/13       23       150   111110-A-07
## 3   11/10/11         18 12/31/13       23       150   111110-A-08
## 4   11/10/11         18 12/31/13       23       150   111110-A-06
## 5   11/10/11         18 12/31/13       23       150   111110-A-04
## 6   11/10/11         18 12/31/13       23       150   111110-A-03
##   VegSurveyDate distOnTransect Orig.siteID
## 1      11/10/11             NA            
## 2      11/10/11             NA            
## 3      11/10/11             NA            
## 4      11/10/11             NA            
## 5      11/10/11             NA            
## 6      11/10/11             NA
topo10 <- read.csv('data/csv_masters/topo10.csv',as.is=T)
dim(topo10)
## [1] 96 22
names(topo10)
##  [1] "siteID"    "domain"    "elevation" "slope"     "aspect"   
##  [6] "d2at"      "d2fb"      "d2cs"      "rad080"    "rad172"   
## [11] "rad355"    "thl0"      "thl315"    "thl337"    "plow050"  
## [16] "plow125"   "plow250"   "plow500"   "tpi050"    "tpi125"   
## [21] "tpi250"    "tpi500"
topo30 <- read.csv('data/csv_masters/topo30.csv',as.is=T)
dim(topo30)
## [1] 96 21
names(topo30)
##  [1] "siteID"    "domain"    "elevation" "slope"     "d2at"     
##  [6] "d2fb"      "d2cs"      "rad080"    "rad172"    "rad355"   
## [11] "thl0"      "thl315"    "thl337"    "plow050"   "plow125"  
## [16] "plow250"   "plow500"   "tpi050"    "tpi125"    "tpi250"   
## [21] "tpi500"

‘dw’ has all daily weather values for all sites and days. The dlySummary contains summaries of weather data on each day - average weather across sites. dlySol has diurnal radiation at each site for each day of the year, from Adam Wilson’s analyses on a 30m dem.

dw <- read.csv('data/csv_masters/2012daily.csv',as.is=T)
head(dw)
##   doy11 year month day   Tmin Tmin.time   Tmax Tmax.time   Tmean  RHmin
## 1   366 2012     1   1 14.936  6.166667 25.210  17.16667 20.0730 71.361
## 2   366 2012     1   1 13.762  6.166667 27.161  15.00000 20.4615 78.299
## 3   366 2012     1   1 13.233  6.000000 22.776  17.33333 18.0045 83.567
## 4   366 2012     1   1 13.546  6.000000 25.647  15.33333 19.5965 81.918
## 5   366 2012     1   1 12.413  6.000000 22.896  13.83333 17.6545 91.167
## 6   366 2012     1   1 12.727  6.000000 27.014  14.00000 19.8705 84.421
##    RHmax    VPmax RHsat.hrs     date siteID doy
## 1 77.590 2.239716         0 2012-1-1   CHK1   1
## 2 82.556 2.267693         0 2012-1-1   CHK2   1
## 3 85.799 2.215489         0 2012-1-1   CHK3   1
## 4 83.437 2.133584         0 2012-1-1   CHK4   1
## 5 87.839 2.011852         0 2012-1-1   CHK5   1
## 6 87.179 2.199438         0 2012-1-1   CHK6   1
dlySummary <- read.csv('data/csv_outfiles/dlySummary.csv',as.is=T)
head(dlySummary)
##   doy year month day       date     Tmin     Tmax    Tmean   dtRange
## 1   1 2012     1   1 2012-01-01 11.39483 21.62492 16.50988 10.230091
## 2   2 2012     1   2 2012-01-02 14.97574 19.01598 16.99586  4.040239
## 3   3 2012     1   3 2012-01-03 14.75741 23.10118 18.92930  8.343773
## 4   4 2012     1   4 2012-01-04 10.78530 23.34831 17.06680 12.563011
## 5   5 2012     1   5 2012-01-05 13.78350 25.12926 19.45638 11.345761
## 6   6 2012     1   6 2012-01-06 14.36026 27.59257 20.97641 13.232307
##      RHmin    RHmax    VPmax RHsat.hrs sommax sommin Kps.mean_hPa Kppt_mm
## 1 83.64332 92.68127 1.920619  5.545455     16     15        997.0     0.6
## 2 52.31199 99.71526 1.937674 19.000000     15     14        995.5    11.6
## 3 52.63628 98.78947 1.702166 15.679924      7      7        997.2     0.2
## 4 39.66210 94.06428 1.792897  4.657197     11     11        998.2     0.0
## 5 28.20923 95.26178 1.872311  7.526515     11      7        999.6     0.0
## 6 51.95174 85.88167 2.043691  1.242424     11      8        999.4     0.0
##   Kwspd.mean_m.s Kwspd.max_m.s
## 1       2.991667           5.6
## 2       2.504167           5.4
## 3       2.279167           4.7
## 4       2.304167           3.1
## 5       1.975000           2.7
## 6       2.170833           4.8
dlySol <- readRDS('data/Rdata/dlySol.Rdata')
dlySol[1:6,1:6]
##      rad_tot_001 rad_tot_002 rad_tot_003 rad_tot_004 rad_tot_005
## CHK1    8663.728    8652.467    8640.090    8626.597    8611.990
## CHK2    9244.997    9240.468    9235.409    9229.817    9223.687
## CHK3    7737.912    7724.081    7708.918    7692.427    7674.611
## CHK4    8369.773    8371.854    8374.001    8376.205    8378.459
## CHK5    8767.362    8767.380    8767.277    8767.045    8766.678
## CHK6    8059.166    8050.924    8041.838    8031.906    8021.127
##      rad_tot_006
## CHK1    8596.270
## CHK2    9217.014
## CHK3    7629.731
## CHK4    8380.751
## CHK5    8766.166
## CHK6    8009.500

Before fitting any spatial models of weather variation, we want to check for correlations and spatial autocorrelation of predictors. Based on the topo10 variables, here are all variables with pairwise correlations R2 > 0.5. Commented code can be run to see more detail. As expected, radiation load variables are mostly correlated, and hillslope position variables are correlated. Below, we run alternative models swapping these in and out, but not using them in the same model.

# cor(topo10[,-c(1:2)],use='pair')
# pairs(topo10[,-c(1:2)])
# names(topo10)
# 
## find correlations > 0.5
i <- 3; j <- 4;
for (i in 3:(ncol(topo10)-1)) for (j in (i+1):ncol(topo10)) {
    rr <- cor(topo10[,i],topo10[,j],use='pair')
    if (abs(rr)>sqrt(0.5)) print(c(names(topo10)[c(i,j)],round(rr,3)))
}
## [1] "slope"  "rad355" "-0.746"
## [1] "rad080" "rad172" "0.952" 
## [1] "rad080" "thl0"   "0.827" 
## [1] "rad080" "thl337" "0.78"  
## [1] "rad172" "thl0"   "0.723" 
## [1] "thl0"   "thl315" "0.803" 
## [1] "thl0"   "thl337" "0.951" 
## [1] "thl315" "thl337" "0.946" 
## [1] "plow050" "plow125" "0.814"  
## [1] "plow125" "plow250" "0.917"  
## [1] "plow250" "plow500" "0.847"  
## [1] "plow250" "tpi250"  "0.755"  
## [1] "plow500" "tpi250"  "0.708"  
## [1] "plow500" "tpi500"  "0.783"  
## [1] "tpi050" "tpi125" "0.839" 
## [1] "tpi050" "tpi250" "0.737" 
## [1] "tpi125" "tpi250" "0.93"  
## [1] "tpi125" "tpi500" "0.762" 
## [1] "tpi250" "tpi500" "0.897"
# 
# # distance from Atlantic and False Bay, borderline
# # Ran some models swapping these two variables in and out, and found false bay on average increased R2 more, and overall model R2 are significantly correlated so when one works, either works overall. Based on this, all models use Distance to False Bay.
# cor(topo10[,c('d2at','d2fb')],use='pair') # R = -0.65
# 
# # equinox and winter solstice radiation
# cor(topo10[,c('rad080','rad172')],use='pair') # R = 0.95
# 
# # all tpi variables with each other across scales, and plow with each other across scales; elevation weakly correlated with tpi and plow
# cor(topo10[,c('elevation','tpi050','tpi125','tpi250','tpi500','plow050','plow125','plow250','plow500')],use='pair') # R >= 0.89
# cor(topo30[,c('elevation','tpi050','tpi125','tpi250','tpi500','plow050','plow125','plow250','plow500')],use='pair') # R >= 0.89
# 

How does the Topographic Heat Load index, calculated with different aspect orientations, compare to daily radiation load from a solar model? Maximum values are near the equinoxes, with a minimum at summer solstice and a dip at winter solstice. thl315 has lower correlation as it is furthest from north-facing, so may work for temperature due to hotter afternoon air temp, but is not as strongly correlated with solar radiation per se.

# # How does daily solar radiation compare to topographic heat load
# # correlation max at 0.7 on day 48 (Feb 17) and around day 295 (Oct 21)
# # minimum mid-summer solstice
# head(dlySol)
# head(topo10)
solVthl <- c()
i=1
for (i in 1:366) solVthl[i] <- cor(topo10$thl0,dlySol[,i],use='pair')
plot(1:366,solVthl,type='l')   

for (i in 1:366) solVthl[i] <- cor(topo10$thl315,dlySol[,i],use='pair')
lines(1:366,solVthl,col='red')

for (i in 1:366) solVthl[i] <- cor(topo10$thl337,dlySol[,i],use='pair')
lines(1:366,solVthl,col='blue')   

The following shows scatterplots of pairwise differences matrices for spatial distance vs. each environmental factor, and the accompanying Mantel test. The plots that are commented out are not significant for spatial correlation, and plots that are significant are illustrated. The random string in the box is an identifier to link this code to the results in the manuscript. Site 32 (Kirstenbosch Race Track) is removed from analysis.

As expected, distance from ocean variables are strongly spatially correlated. This raises questions about their interpretation in the models, but they are generally their for the purpose of capturing spatial gradients, so that could be okay. Elevation and slope are also spatially structured, though the scatterplots suggests less concern. Exposure and hillslope positions are not correlated, indicating we did a good job stratifying across the region.

# 
# #--------------------------------------------#
# # SPATIAL VARIATION IN TOPOGRAPHIC VARIABLES #
# # a6STycTJVjqM8vRd8V8h                       #
# #--------------------------------------------#
library(ade4)
# check matrices are aligned
all(meta$siteID==topo10$siteID)
## [1] TRUE
#head(meta)
sdist <- dist(meta[-32,c('UTM.east','UTM.north')])
hist(sdist,main='Histogram of distances between sites (m)')

selev <- dist(meta$elevation[-32])
plot(sdist,selev)

mantel.rtest(sdist,selev,nrepet=999)
## Warning in is.euclid(m2): Zero distance(s)
## Monte-Carlo test
## Observation: 0.2976591 
## Call: mantel.rtest(m1 = sdist, m2 = selev, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.001
sd2at <- dist(topo10$d2at[-32])
plot(sdist,sd2at)

mantel.rtest(sdist,sd2at,nrepet=999)
## Monte-Carlo test
## Observation: 0.4048515 
## Call: mantel.rtest(m1 = sdist, m2 = sd2at, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.001
sd2fb <- dist(topo10$d2fb[-32])
plot(sdist,sd2fb)

mantel.rtest(sdist,sd2fb,nrepet=999)
## Monte-Carlo test
## Observation: 0.9611594 
## Call: mantel.rtest(m1 = sdist, m2 = sd2fb, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.001
plot(selev,sd2fb)

mantel.rtest(selev,sd2fb,nrepet=999)
## Warning in is.euclid(m1): Zero distance(s)
## Warning in is.euclid(distmat): Zero distance(s)
## Monte-Carlo test
## Observation: 0.2783594 
## Call: mantel.rtest(m1 = selev, m2 = sd2fb, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.001
sd2cs <- dist(topo10$d2cs[-32])
plot(sdist,sd2cs)

mantel.rtest(sdist,sd2cs,nrepet=999)
## Monte-Carlo test
## Observation: 0.1044175 
## Call: mantel.rtest(m1 = sdist, m2 = sd2cs, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.003
dslope <- dist(topo10$slope[-32])
plot(sdist,dslope)

mantel.rtest(sdist,dslope,nrepet=999)
## Monte-Carlo test
## Observation: 0.0561465 
## Call: mantel.rtest(m1 = sdist, m2 = dslope, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.018
dtpi50 <- dist(topo10$tpi050[-32])
#plot(sdist,dtpi50)
mantel.rtest(sdist,dtpi50,nrepet=999)
## Monte-Carlo test
## Observation: 0.02425777 
## Call: mantel.rtest(m1 = sdist, m2 = dtpi50, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.176
drad080 <- dist(topo10$rad080[-32])
#plot(sdist,drad080)
mantel.rtest(sdist,drad080,nrepet=999)
## Monte-Carlo test
## Observation: 0.01535659 
## Call: mantel.rtest(m1 = sdist, m2 = drad080, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.196
dthl <- dist(topo10$thl0[-32])
#plot(sdist,dthl)
mantel.rtest(sdist,dthl,nrepet=999)
## Monte-Carlo test
## Observation: 0.01342669 
## Call: mantel.rtest(m1 = sdist, m2 = dthl, nrepet = 999)
## Based on 999 replicates
## Simulated p-value: 0.26
# #-------------------------------------------------#
# # MAKE 3 CHAR DAYS VAR, for dlySol variable names #
# #-------------------------------------------------#
days <- as.character(1:366)
days[nchar(days)==1] <- paste('00',days[nchar(days)==1],sep='')
days[nchar(days)==2] <- paste('0',days[nchar(days)==2],sep='')

Okay, the modeling strategy is a bit brute force now! First set up four variables that have alternative terms for equivalent topographic variables: radterm = solar radiation; hillterm = topographic hillslope position; regterm = regional position relative to ocean; slterm = slope (only one variable). Plus there’s elevation in every model. Then set up modterms and modnums which contain the names and numbers of each item for all 120 combinations (5 radterm * 8 hillterm * 3 regterm). And finally set up a bunch of lists which will have four items each, holding results for Tmin, Tmax, RHsat.hrs, and VPmax; each item in the list will be a 366 row matrix with model results for each day

#-------------------------------#
# DAILY WEATHER MODELS          #
# OHkNt68RXvbTk4jHfczb          #
#-------------------------------#

# Run through daily models testing all combinations of elevation, radiation, hillslope, and regional position terms
radterm <- c('dsol','rad080','rad172','rad355','thl315','thl337','thl0')
hillterm <- c('tpi050','tpi125','tpi250','tpi500','plow050','plow125','plow250','plow500')
regterm <- c('d2fb','d2at','d2cs')
slterm <- 'slope'
nModels <- length(radterm)*length(hillterm)*length(regterm)

modterms <- matrix(NA,nModels,5)
modnums <- matrix(NA,nModels,5)
n <- 0
for (i in 1:length(radterm)) for (j in 1:length(hillterm)) for (k in 1:length(regterm)) {
    n <- n+1
    modterms[n,] <- c('elevation',radterm[i],hillterm[j],regterm[k],slterm)
    modnums[n,] <- c(1,i,j,k,1)
}

## running all possible models on daily basis for four dependent parameters: Tmin, Tmax, RHsat.hrs, VPmax
RSQelev <- list() # r-squared, elevation only model
MSEelev <- list() # mean square error, elevation only model
RSQm5x <- list() # highest r-squared, 5 term model
MSEm5x <- list() # mean square error for max r-squared 5 term model
BFm5x <- list() # best-fit model: number from 1:120 indicating which had highest r-squared
RSQm5 <- list() # matrix of 120 r-squared for all 5 term models
MSEm5 <- list() # matrix of 120 mean square errors for all 5 term models

vars <- c('Tmin','Tmax','RHsat.hrs','VPmax')
for (v in 1:length(vars)) {
    RSQm5[[v]] <- matrix(NA,366,ncol=nModels)
    MSEm5[[v]] <- matrix(NA,366,ncol=nModels)
    RSQelev[[v]] <- rep(NA,366)
    MSEelev[[v]] <- rep(NA,366)
    RSQm5x[[v]] <- rep(NA,366)
    MSEm5x[[v]] <- rep(NA,366)
    BFm5x[[v]] <- rep(NA,366)
}

Now, run through the four weather variables (v loop) and the 366 days (d loop). Each time through, subset the weather data, and insert the appropriate daily solar radiation from the dlySol data.frame into topo10$dsol. Fit one model with elevation only, and record results. Then fit all 120 possible 5 terms models (i,j,k, loop), saving r-squared and MSE. After running all of them, extract best fit and corresponding r-sq and MSE.

v <- 1
for (v in 1:length(vars)) {
    d <- 1
    for (d in 1:366) {
        #print(c(v,d))
        dd <- subset(dw,dw$doy==d)
        dd <- dd[match(dd$siteID,meta$siteID),]
        dd$yvar <- dd[,vars[v]]
        dd$yvar[meta$use4Analyses==0] <- NA
        topo10$dsol <- dlySol[,paste('rad_tot_',days[d],sep='')]
        
        #  Base model with elevation only
        fit1 <- lm(dd$yvar~elevation,data=topo10)
        RSQelev[[v]][d] <- summary(fit1)$r.sq
        MSEelev[[v]][d] <- sd(fit1$residuals,na.rm=T)
        
        # run all 5 parameter models
        n <- 0
        i=1;j=1;k=1
        for (i in 1:length(radterm)) for (j in 1:length(hillterm)) for (k in 1:length(regterm)) {
            n <- n+1
            #print(c(d,i,j,k,n))

            fit2 <- lm(dd$yvar~topo10[,'elevation']+topo10[,radterm[i]]+topo10[,hillterm[j]]+topo10[,regterm[k]]+topo10[,slterm])
            RSQm5[[v]][d,n] <- summary(fit2)$r.sq
            MSEm5[[v]][d,n] <- sd(fit2$residuals,na.rm=T)
        }
    }
    
    ## now extract rsq and mse for the best model, and record id # of best model
    BFm5x[[v]] <- apply(RSQm5[[v]],1,which.max)
    for (d in 1:366) {
        RSQm5x[[v]][d] <- RSQm5[[v]][d,BFm5x[[v]][d]]
        MSEm5x[[v]][d] <- MSEm5[[v]][d,BFm5x[[v]][d]]
    }
}

Results

Now, let’s look at the results! First, how do Rsq for the elevation and the best of the 5-term models vary through the year, for each variable? Interestingly, for all four variables results fluctuate a lot day to day, especially based on elevation alone. Using 5 variables, r-squareds are mostly between 40 and 90%, with lower values for: Tmin in spring and fall, Tmax in summer, RHsat.hrs in winter, and VPmax in summer.

op=par(mfrow=c(2,2))
i=1
for (i in 1:4) {
    plot(1:366,RSQelev[[i]],type='l',col='red',ylim=c(0,1),main=vars[i],xlab='DOY',ylab='r-squared')
    points(1:366,RSQm5x[[i]],type='l',lwd=2)
}

par(op)

Interestingly, there are only weak correlations between model fit across variables on a given day. The strongest correlation is for the 5-term models for Tmin and Tmax (var 1 vs var 2 in second plot below):

pairs(cbind(RSQelev[[1]],RSQelev[[2]],RSQelev[[3]],RSQelev[[4]]))

pairs(cbind(RSQm5x[[1]],RSQm5x[[2]],RSQm5x[[3]],RSQm5x[[4]]))

My main goal has been trying to make sense of variation in the daily r-squared variables, and next step would be to sort out the meaning of which terms contribute to the best model. Presumably many of the r-squared values are effectively indistinguishable, but that would require quite a lot of examination.

Here are plots of the max r-squared for each variable versus the daily mean for that variable.

Tmin - nothing doing!

plot(dlySummary$Tmin,RSQm5x[[1]],ylab='Max r-squared, Tmin')

Tmax - r-squared are higher on cooler days, i.e. topography is more predictive of spatial variation in Tmax

plot(dlySummary$Tmax,RSQm5x[[2]],ylab='Max r-squared, Tmax')

Tmax variation is also much better explained on days with a low Tmin-Tmax difference, which would presumably be humid and/or cloudy:

plot(dlySummary$Tmax-dlySummary$Tmin,RSQm5x[[2]],ylab='Max r-squared, Tmax',xlab='Tmax-Tmin')

That’s interesting, as I think of the coolest days as often being clear with a potentially high Tmin-Tmax difference. Just a quick check on how those two are related:

plot(dlySummary$Tmax,dlySummary$Tmax-dlySummary$Tmin,xlab='Tmax',ylab='Tmax-Tmin')

plot(dlySummary$Tmin,dlySummary$Tmax,xlab='Tmin',ylab='Tmax')
abline(0,1)

RHsat.hrs - r-squared are higher on days with intermediate number of hours of fog - perhaps these days have the most variation among stations to explain, since clear (=0) and all foggy (=24), there’s not much variation to work with

plot(dlySummary$RHsat.hrs,RSQm5x[[3]],ylab='Max r-squared, RHsat.hrs')

VPmax - not much going on

plot(dlySummary$VPmax,RSQm5x[[4]],ylab='Max r-squared, VPmax')

Inspection of Kirstenbosch weather variables didn’t add much here.

There’s also not much obvious variation among SOM classes from Jasper’s regional weather pattern analysis:

plot(dlySummary$sommin,RSQm5x[[1]],ylab='Max r-squared, Tmin',xlab='Tmin regional SOM')

plot(dlySummary$sommax,RSQm5x[[2]],ylab='Max r-squared, Tmax',xlab='Tmax regional SOM')

# t1 <- table(BFm5x[[1]])
# modterms[as.numeric(names(t1)[which.max(t1)]),]
# 
# t1 <- table(BFm5x[[2]])
# modterms[as.numeric(names(t1)[which.max(t1)]),]
# 
# t1 <- table(BFm5x[[3]])
# modterms[as.numeric(names(t1)[which.max(t1)]),]
# 
# t1 <- table(BFm5x[[4]])
# modterms[as.numeric(names(t1)[which.max(t1)]),]

# modterms[46,]
# modnums[46,]
# plot(modnums[BFm5x[[2]],3])
# 
# plot(RSQelev[[1]],RSQm5x[[1]])
# plot(RSQelev[[2]],RSQm5x[[2]])
# plot(RSQelev[[3]],RSQm5x[[3]])
# plot(RSQelev[[4]],RSQm5x[[4]])
# 
# plot(MSEelev[[1]],MSEm5x[[1]]);abline(0,1)
# plot(MSEelev[[2]],MSEm5x[[2]]);abline(0,1)
# plot(MSEelev[[3]],MSEm5x[[3]]);abline(0,1)
# plot(MSEelev[[4]],MSEm5x[[4]]);abline(0,1)
# 
# head(dlySummary)
# boxplot(MSEm5x[[2]]~dlySummary$sommax)
# boxplot(MSEm5x[[1]]~dlySummary$sommin)
# 
# plot(RSQm5x[[1]]~dlySummary$Kps.mean_hPa)
# plot(RSQm5x[[2]]~dlySummary$Kps.mean_hPa)
# plot(RSQm5x[[3]]~dlySummary$Kps.mean_hPa)
# plot(RSQm5x[[4]]~dlySummary$Kps.mean_hPa)
# 
# plot(RSQm5x[[1]]~dlySummary$Kwspd.mean_m.s)
# plot(RSQm5x[[2]]~dlySummary$Kwspd.mean_m.s)
# plot(RSQm5x[[3]]~dlySummary$Kwspd.mean_m.s)
# plot(RSQm5x[[4]]~dlySummary$Kwspd.mean_m.s)
# 
# plot(RSQm5x[[1]]~dlySummary$Tmin)
# plot(RSQm5x[[2]]~dlySummary$Tmax)
# plot(RSQm5x[[3]]~dlySummary$RHsat.hrs)
# plot(RSQm5x[[4]]~dlySummary$VPmax)
# 
# plot(RSQm5x[[1]]~I(dlySummary$Tmax - dlySummary$Tmin))
# plot(RSQm5x[[2]]~I(dlySummary$Tmax - dlySummary$Tmin))
# plot(RSQm5x[[3]]~I(dlySummary$Tmax - dlySummary$Tmin))
# plot(RSQm5x[[4]]~I(dlySummary$Tmax - dlySummary$Tmin))
# 
# plot(I(dlySummary$Tmax - dlySummary$Tmin)~dlySummary$Tmin)
# plot(I(dlySummary$Tmax - dlySummary$Tmin)~dlySummary$Tmax)
# plot(I(dlySummary$Tmax - dlySummary$Tmin)~dlySummary$RHsat.hrs)
# plot(I(dlySummary$Tmax - dlySummary$Tmin)~dlySummary$VPmax)